Reduction of Dutch Sentences for Automatic Subtitling

Authors

  • Erik F. Tjong Kim Sang
  • Walter Daelemans
  • Anja Höthker
Abstract

We compare two kinds of sentence length reduction for the automatic generation of subtitles for deaf and hearing-impaired people: machine learning approaches and a method that relies on hand-crafted deletion rules. We describe how we built the necessary resources for this task: a parallel corpus of examples from news broadcasts of the Flemish broadcasting corporation VRT, and a Dutch shallow parser based on material from the Spoken Dutch Corpus (CGN). We evaluate the sentence simplifiers and discuss their performance.

Related articles

Automatic Sentence Simplification for Subtitling in Dutch and English

We describe ongoing work on sentence summarization in the European MUSA project and the Flemish ATraNoS project. Both projects aim at automatic generation of TV subtitles for hearing-impaired people. This involves speech recognition, a topic which is not covered in this paper, and summarizing sentences in such a way that they fit in the available space for subtitles. The target language is equa...

Sentence Compression For Automatic Subtitling

This paper investigates sentence compression for automatic subtitle generation using supervised machine learning. We present a method for sentence compression as well as discuss generation of training data from compressed Finnish sentences, and different approaches to the problem. The method we present outperforms a state-of-the-art baseline in both automatic and human evaluation. On real data, 4...

STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools

We present a modular video subtitling platform that integrates speech/non-speech segmentation, speaker diarisation, language identification, Dutch speech recognition with state-of-the-art acoustic models and language models optimised for efficient subtitling, appropriate pre- and postprocessing of the data and alignment of the final result with the video fragment. Moreover, the system is able to ...

Automatic Classification of Sentences in Dutch Laws

The work described here builds on [1], where we presented a categorisation of norms or provisions in legislation. We claimed that the categories are characterized by the use of typical sentence structures and that this would enable automatic detection and classification. In this paper we present the results of experiments in such automatic classification of provisions. We have defined fourteen ...

Intralingual Open Subtitling in Flanders: Audiovisual Translation, Linguistic Variation and Audience Needs

This article presents an overview of the main findings of an interdisciplinary research project carried out by scholars from a department of translation and interpreting, a department of communication science and a department of linguistics. The project investigated Dutch open subtitling of native speakers of either northern Dutch or a Flemish (regional) variant of Dutch on Flemish television. ...

Publication date: 2003